Contributor

@mverzilli mverzilli commented Jan 8, 2026

I decided to fragment #19293 into a smaller, more digestible (both for reviewers and for myself) series of PRs.

The end goal is to refactor PXE's stores so they work with "staged writes": every write to a store is now kept in memory segmented by a jobId, and is not written to the underlying KV store until a coordinated commit.

Relevant stores will (in subsequent PRs) implement a new StagedStore interface, which defines the following methods:

  • commit(jobId): when called, moves all the in-memory data corresponding to jobId to the persistent KV store.
  • discardStaged(jobId): clears any in-memory data structures associated with jobId without persisting them.
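
To make the two methods concrete, here is a minimal sketch of what a store following this contract could look like. Only the names StagedStore, commit and discardStaged come from the description above; ToyStagedStore, its set and persistedSize methods, the Promise<void> return types, and the use of a plain Map as a stand-in for the KV store are all assumptions made for illustration.

```typescript
// Sketch of the interface described above. The return types are an assumption.
interface StagedStore {
  commit(jobId: string): Promise<void>;
  discardStaged(jobId: string): Promise<void>;
}

// Hypothetical toy store: a Map stands in for the persistent KV store.
class ToyStagedStore implements StagedStore {
  private readonly persisted = new Map<string, string>();
  // Staged writes, segmented by jobId.
  private readonly staged = new Map<string, Map<string, string>>();

  // Writes land in the per-job staging area, never directly in `persisted`.
  set(key: string, value: string, jobId: string): void {
    let jobWrites = this.staged.get(jobId);
    if (jobWrites === undefined) {
      jobWrites = new Map<string, string>();
      this.staged.set(jobId, jobWrites);
    }
    jobWrites.set(key, value);
  }

  // Moves everything staged under jobId into the "persistent" map.
  async commit(jobId: string): Promise<void> {
    const jobWrites = this.staged.get(jobId);
    if (jobWrites !== undefined) {
      jobWrites.forEach((value, key) => this.persisted.set(key, value));
    }
    this.staged.delete(jobId);
  }

  // Drops everything staged under jobId without persisting it.
  async discardStaged(jobId: string): Promise<void> {
    this.staged.delete(jobId);
  }

  persistedSize(): number {
    return this.persisted.size;
  }
}
```

In this sketch, writes made under one jobId are invisible to the persisted map until commit is called for that same jobId, and discarding one job leaves the staged data of other jobs untouched.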

Read operations can optionally receive a jobId, which affects behavior as follows:

  • If not provided (or undefined): read from KV store ("read committed")
  • If provided: read committed + staged data associated with the jobId (how both sources of data are unified is store-dependent).
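
The two read modes can be sketched as follows. Since unification is store-dependent, this example picks one possible strategy (a staged value shadows the committed one for the same key); that strategy, the ToyReadStore class and its stage and get methods are assumptions, not the actual PXE store API.

```typescript
// Hypothetical toy store illustrating "read committed" vs. "committed + staged".
class ToyReadStore {
  private readonly committed = new Map<string, string>();
  private readonly staged = new Map<string, Map<string, string>>();

  constructor(committedEntries: Array<[string, string]>) {
    committedEntries.forEach(([key, value]) => this.committed.set(key, value));
  }

  // Records a write in the staging area of the given job.
  stage(jobId: string, key: string, value: string): void {
    let jobWrites = this.staged.get(jobId);
    if (jobWrites === undefined) {
      jobWrites = new Map<string, string>();
      this.staged.set(jobId, jobWrites);
    }
    jobWrites.set(key, value);
  }

  // Without a jobId: committed data only.
  // With a jobId: that job's staged value wins, falling back to committed data.
  get(key: string, jobId?: string): string | undefined {
    if (jobId !== undefined) {
      const stagedValue = this.staged.get(jobId)?.get(key);
      if (stagedValue !== undefined) {
        return stagedValue;
      }
    }
    return this.committed.get(key);
  }
}
```

A reader that passes no jobId, or a different jobId, never observes another job's staged writes in this sketch.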

A new JobCoordinator class exposes the following methods for PXE's convenience:

  • registerStores(stagedStores: StagedStore[]): makes a collection of stores known to the JobCoordinator.
  • beginJob(): string: called by PXE when a job starts, returns a jobId that then gets threaded through the job's phases.
  • commitJob(jobId): iterates over all registered stores, calling commit(jobId) on each, wrapped in a transactionAsync call to guarantee that all writes happen in the same KV transaction.
  • abortJob(jobId): same as commitJob, but calling discardStaged(jobId).
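
The coordinator described by these bullets can be sketched roughly as below. This is not the code from the PR: SketchJobCoordinator, the TransactionalKv interface, the counter-based job IDs, the single-job check and the choice to run discards outside a transaction are all assumptions; only the method names and the transactionAsync name come from the description and review comments.

```typescript
interface StagedStore {
  commit(jobId: string): Promise<void>;
  discardStaged(jobId: string): Promise<void>;
}

// Hypothetical stand-in for the KV store's transaction API.
interface TransactionalKv {
  transactionAsync<T>(work: () => Promise<T>): Promise<T>;
}

class SketchJobCoordinator {
  private readonly kv: TransactionalKv;
  private readonly stores: StagedStore[] = [];
  private currentJobId: string | undefined;
  private nextId = 0; // Assumption: a counter is enough to make IDs unique here.

  constructor(kv: TransactionalKv) {
    this.kv = kv;
  }

  registerStores(stagedStores: StagedStore[]): void {
    this.stores.push(...stagedStores);
  }

  beginJob(): string {
    if (this.currentJobId !== undefined) {
      throw new Error(`Job ${this.currentJobId} is already in progress`);
    }
    this.nextId += 1;
    this.currentJobId = `job-${this.nextId}`;
    return this.currentJobId;
  }

  // Commits every registered store inside a single KV transaction.
  async commitJob(jobId: string): Promise<void> {
    this.assertCurrent(jobId);
    await this.kv.transactionAsync(async () => {
      for (const store of this.stores) {
        await store.commit(jobId);
      }
    });
    this.currentJobId = undefined;
  }

  // Discards staged data; assumed not to need a transaction since nothing is persisted.
  async abortJob(jobId: string): Promise<void> {
    this.assertCurrent(jobId);
    for (const store of this.stores) {
      await store.discardStaged(jobId);
    }
    this.currentJobId = undefined;
  }

  hasJobInProgress(): boolean {
    return this.currentJobId !== undefined;
  }

  private assertCurrent(jobId: string): void {
    if (jobId !== this.currentJobId) {
      throw new Error(`Job ${jobId} is not the job in progress`);
    }
  }
}
```

A caller would do something like `const jobId = coordinator.beginJob()`, pass jobId to every store operation in the job, and finish with either `commitJob(jobId)` or `abortJob(jobId)`.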

As a result, any data operations performed before PXE decides to commitJob are discarded if PXE fails, the process is killed, etc.

This specific PR introduces the JobCoordinator class, makes PXE jobs use it, and threads jobIds through ContractFunctionSimulator and the oracles, where they will be used as parameters to store operations.

@mverzilli mverzilli added the ci-no-fail-fast label (Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure) Jan 8, 2026
Contributor

@Thunkar Thunkar left a comment

Nice!

@mverzilli mverzilli added this pull request to the merge queue Jan 9, 2026
Merged via the queue into next with commit a1b1699 Jan 9, 2026
25 of 26 checks passed
@mverzilli mverzilli deleted the martin/job-coordinator branch January 9, 2026 15:30
Comment on lines +113 to +114
for (const store of this.#stores.values()) {
await store.commit(jobId);
Contributor

Can we do Promise.all? Or do we have reason to believe that these writes cannot be made concurrently?

Contributor Author

We should be able to use Promise.all, since stores are disjoint. Let me tackle that once the more critical path is done.
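
For reference, the concurrent variant discussed in this thread could look roughly like the sketch below. It is not code from the PR: CommittableStore and commitAllConcurrently are hypothetical names, and the sketch assumes, as stated above, that stores write to disjoint data.

```typescript
// Hypothetical minimal shape of a store for this example.
interface CommittableStore {
  commit(jobId: string): Promise<void>;
}

// Starts every commit at once and waits for all of them to finish.
async function commitAllConcurrently(stores: CommittableStore[], jobId: string): Promise<void> {
  await Promise.all(stores.map(store => store.commit(jobId)));
}
```

One thing to keep in mind with this shape: Promise.all rejects as soon as any commit rejects, while the other commits keep running in the background, so the enclosing transaction has to tolerate writes that are still in flight when the error surfaces.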

* Checks if there's a job currently in progress.
*/
hasJobInProgress(): boolean {
return this.#currentJobId !== undefined;
Contributor

Why do commit and abort receive the job ID as a parameter, instead of the coordinator exposing it? Is it to ensure that the caller is properly tracking the job ID they got from begin?

Contributor Author

Yeah, I designed the API around the possibility of having multiple concurrent jobs, so when/if that time comes, #currentJobId would become #currentJobs: string[] and the internal checks would get relaxed, but the external API would look the same.

It's not strictly necessary for it to be like that in its current incarnation, but I also don't see anything wrong with it.
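
As a rough illustration of the evolution described in this reply, the sketch below tracks several jobs in a string[] while keeping a begin/finish-by-ID shape. MultiJobTracker and finishJob are hypothetical names, and the counter-based IDs are an assumption; nothing here is taken from the actual implementation.

```typescript
// Hypothetical tracker showing how a single current job ID could become a list.
class MultiJobTracker {
  private currentJobs: string[] = [];
  private nextId = 0;

  // Relaxed check: starting a job no longer requires that none is in progress.
  beginJob(): string {
    this.nextId += 1;
    const jobId = `job-${this.nextId}`;
    this.currentJobs.push(jobId);
    return jobId;
  }

  // Stands in for the bookkeeping that commitJob/abortJob would do.
  finishJob(jobId: string): void {
    const index = this.currentJobs.indexOf(jobId);
    if (index < 0) {
      throw new Error(`Job ${jobId} is not in progress`);
    }
    this.currentJobs.splice(index, 1);
  }

  hasJobInProgress(): boolean {
    return this.currentJobs.length > 0;
  }
}
```

Because callers already pass the job ID they received from beginJob, their code would not need to change when the internal bookkeeping moves from one ID to many.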

AztecBot pushed a commit that referenced this pull request Jan 12, 2026
Second part of the series started with #19445.

This makes the CapsuleStore work based on staged writes. With this, capsules aren't written to persistent storage until PXE decides to commit the job.
AztecBot pushed a commit that referenced this pull request Jan 14, 2026
Third part of the series started with #19445.

This makes the stores related to tagging synchronization work based on staged writes.
AztecBot pushed a commit that referenced this pull request Jan 14, 2026
Fourth part of the series started with #19445.

This makes the PrivateEventStore work based on staged writes. With this, private events aren't written to persistent storage until PXE decides to commit the job.